randomized policy
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Netherlands (0.04)
Selecting Optimal Decisions via Distributionally Robust Nearest-Neighbor Regression
This paper develops a prediction-based prescriptive model for optimal decision making that (i) predicts the outcome under each action using a robust nonlinear model, and (ii) adopts a randomized prescriptive policy determined by the predicted outcomes. The predictive model combines a new regularized regression technique, which was developed using Distributionally Robust Optimization (DRO) with an ambiguity set constructed from the Wasserstein metric, with the K-Nearest Neighbors (K-NN) regression, which helps to capture the nonlinearity embedded in the data. We show theoretical results that guarantee the out-of-sample performance of the predictive model, and prove the optimality of the randomized policy in terms of the expected true future outcome. We demonstrate the proposed methodology on a hypertension dataset, showing that our prescribed treatment leads to a larger reduction in the systolic blood pressure compared to a series of alternatives. A clinically meaningful threshold level used to activate the randomized policy is also derived under a sub-Gaussian assumption on the predicted outcome.
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > Canada (0.04)
- Europe > Netherlands (0.04)
- North America > United States > Michigan (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
Competitive Algorithms for Multi-Agent Ski-Rental Problems
Wang, Xuchuang, Sun, Bo, Beyhaghi, Hedyeh, Lui, John C. S., Hajiesmaili, Mohammad, Wierman, Adam
This paper introduces a novel multi-agent ski-rental problem that generalizes the classical ski-rental dilemma to a group setting where agents incur individual and shared costs. In our model, each agent can either rent at a fixed daily cost, or purchase a pass at an individual cost, with an additional third option of a discounted group pass available to all. We consider scenarios in which agents' active days differ, leading to dynamic states as agents drop out of the decision process. To address this problem from different perspectives, we define three distinct competitive ratios: overall, state-dependent, and individual rational. For each objective, we design and analyze optimal deterministic and randomized policies. Our deterministic policies employ state-aware threshold functions that adapt to the dynamic states, while our randomized policies sample and resample thresholds from tailored state-aware distributions. The analysis reveals that symmetric policies, in which all agents use the same threshold, outperform asymmetric ones. Our results provide competitive ratio upper and lower bounds and extend classical ski-rental insights to multi-agent settings, highlighting both theoretical and practical implications for group decision-making under uncertainty.
- North America > United States > Massachusetts > Hampshire County > Amherst (0.50)
- North America > Canada > Ontario > National Capital Region > Ottawa (0.14)
- Asia > China > Hong Kong (0.04)
- (3 more...)
- Transportation > Ground > Road (0.67)
- Transportation > Electric Vehicle (0.67)
- Automobiles & Trucks (0.67)
- Energy > Renewable > Solar (0.46)
Selecting Optimal Decisions via Distributionally Robust Nearest-Neighbor Regression
This paper develops a prediction-based prescriptive model for optimal decision making that (i) predicts the outcome under each action using a robust nonlinear model, and (ii) adopts a randomized prescriptive policy determined by the predicted outcomes. The predictive model combines a new regularized regression technique, which was developed using Distributionally Robust Optimization (DRO) with an ambiguity set constructed from the Wasserstein metric, with the K-Nearest Neighbors (K-NN) regression, which helps to capture the nonlinearity embedded in the data. We show theoretical results that guarantee the out-of-sample performance of the predictive model, and prove the optimality of the randomized policy in terms of the expected true future outcome. We demonstrate the proposed methodology on a hypertension dataset, showing that our prescribed treatment leads to a larger reduction in the systolic blood pressure compared to a series of alternatives. A clinically meaningful threshold level used to activate the randomized policy is also derived under a sub-Gaussian assumption on the predicted outcome.
A Bayesian Learning Algorithm for Unknown Zero-sum Stochastic Games with an Arbitrary Opponent
Jafarnia-Jahromi, Mehdi, Jain, Rahul, Nayyar, Ashutosh
In this paper, we propose Posterior Sampling Reinforcement Learning for Zero-sum Stochastic Games (PSRL-ZSG), the first online learning algorithm that achieves Bayesian regret bound of $O(HS\sqrt{AT})$ in the infinite-horizon zero-sum stochastic games with average-reward criterion. Here $H$ is an upper bound on the span of the bias function, $S$ is the number of states, $A$ is the number of joint actions and $T$ is the horizon. We consider the online setting where the opponent can not be controlled and can take any arbitrary time-adaptive history-dependent strategy. Our regret bound improves on the best existing regret bound of $O(\sqrt[3]{DS^2AT^2})$ by Wei et al. (2017) under the same assumption and matches the theoretical lower bound in $T$.
- North America > United States > California (0.14)
- Asia > Middle East > Jordan (0.04)
- Leisure & Entertainment > Games (0.46)
- Education > Educational Setting (0.34)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.41)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.41)
The Reinforce Policy Gradient Algorithm Revisited
We revisit the Reinforce policy gradient algorithm from the literature. Note that this algorithm typically works with cost returns obtained over random length episodes obtained from either termination upon reaching a goal state (as with episodic tasks) or from instants of visit to a prescribed recurrent state (in the case of continuing tasks). We propose a major enhancement to the basic algorithm. We estimate the policy gradient using a function measurement over a perturbed parameter by appealing to a class of random search approaches. This has advantages in the case of systems with infinite state and action spaces as it relax some of the regularity requirements that would otherwise be needed for proving convergence of the Reinforce algorithm. Nonetheless, we observe that even though we estimate the gradient of the performance objective using the performance objective itself (and not via the sample gradient), the algorithm converges to a neighborhood of a local minimum. We also provide a proof of convergence for this new algorithm.
- North America > United States > New York (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Queue Scheduling with Adversarial Bandit Learning
Huang, Jiatai, Golubchik, Leana, Huang, Longbo
In this paper, we study scheduling of a queueing system with zero knowledge of instantaneous network conditions. We consider a one-hop single-server queueing system consisting of $K$ queues, each with time-varying and non-stationary arrival and service rates. Our scheduling approach builds on an innovative combination of adversarial bandit learning and Lyapunov drift minimization, without knowledge of the instantaneous network state (the arrival and service rates) of each queue. We then present two novel algorithms \texttt{SoftMW} (SoftMaxWeight) and \texttt{SSMW} (Sliding-window SoftMaxWeight), both capable of stabilizing systems that can be stablized by some (possibly unknown) sequence of randomized policies whose time-variation satisfies a mild condition. We further generalize our results to the setting where arrivals and departures only have bounded moments instead of being deterministically bounded and propose \texttt{SoftMW+} and \texttt{SSMW+} that are capable of stabilizing the system. As a building block of our new algorithms, we also extend the classical \texttt{EXP3.S} (Auer et al., 2002) algorithm for multi-armed bandits to handle unboundedly large feedback signals, which can be of independent interest.
- Energy > Power Industry (0.67)
- Transportation > Infrastructure & Services (0.67)
- Information Technology (0.67)
- Transportation > Ground > Road (0.45)
Distributionally Robust Learning
Chen, Ruidi, Paschalidis, Ioannis Ch.
This monograph develops a comprehensive statistical learning framework that is robust to (distributional) perturbations in the data using Distributionally Robust Optimization (DRO) under the Wasserstein metric. Beginning with fundamental properties of the Wasserstein metric and the DRO formulation, we explore duality to arrive at tractable formulations and develop finite-sample, as well as asymptotic, performance guarantees. We consider a series of learning problems, including (i) distributionally robust linear regression; (ii) distributionally robust regression with group structure in the predictors; (iii) distributionally robust multi-output regression and multiclass classification, (iv) optimal decision making that combines distributionally robust regression with nearest-neighbor estimation; (v) distributionally robust semi-supervised learning, and (vi) distributionally robust reinforcement learning. A tractable DRO relaxation for each problem is being derived, establishing a connection between robustness and regularization, and obtaining bounds on the prediction and estimation errors of the solution. Beyond theory, we include numerical experiments and case studies using synthetic and real data. The real data experiments are all associated with various health informatics problems, an application area which provided the initial impetus for this work.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > New York (0.04)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
- (8 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
- (3 more...)